A pilot investigation of information extraction in the semantic annotation of archaeological reports

Authors

Andreas Vlachidis

Douglas Tudhope

Abstract

The paper discusses a prototype investigation of semantic annotation, a form of metadata that assigns conceptual entities to textual instances, in this case in archaeological grey literature. Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process, while Knowledge Organization Systems (KOS) are explored as a means of associating semantic annotations with both ontological and terminological references. The annotation process follows a rule-based information extraction approach using the GATE NLP toolkit, together with the CIDOC CRM ontology, its CRM-EH archaeological extension, and English Heritage thesauri and glossaries. Results from an initial evaluation are reported; they suggest that these information extraction techniques can be applied to archaeological grey literature reports. Further work is discussed, drawing on the evaluation and on the characteristics of the archaeology domain.
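As a rough illustration of the rule-based, vocabulary-driven annotation the abstract describes, the sketch below matches terms from a small gazetteer and attaches a concept label to each match. It is not the authors' GATE pipeline: the term list, the `Context` and `Find` labels, and the `annotate` function are all hypothetical stand-ins for thesaurus entries and ontology entities, and plain regular expressions are used in place of GATE rules.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-gazetteer mapping terms to concept labels.
# These labels are illustrative only, not real CRM-EH identifiers.
GAZETTEER = {
    "pit": "Context",
    "ditch": "Context",
    "posthole": "Context",
    "pottery": "Find",
    "coin": "Find",
}


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the match begins
    end: int      # character offset just past the match
    text: str     # matched surface string
    concept: str  # concept label looked up in the gazetteer


def annotate(text: str) -> list[Annotation]:
    """Return one annotation per whole-word gazetteer match, in text order."""
    # Longest terms first so that longer entries win over their prefixes.
    terms = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        Annotation(m.start(), m.end(), m.group(0), GAZETTEER[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "The ditch contained Roman pottery and a coin."
    for ann in annotate(sentence):
        print(ann)
```

Because matching is on exact whole words, inflected forms such as "pits" are not annotated here; a fuller system would add lemmatisation and contextual rules on top of the lookup.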

Similar resources

Semantic Annotation for Indexing Archaeological Context: A Prototype Development and Evaluation

The paper discusses the process of developing Semantic Annotations, a form of metadata for assigning conceptual entities to textual instances, in this case archaeological grey literature. The use of Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process. The paper explores the use of Ontology Oriented Information Extraction (OOIE) methods...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Semantic-Based Image Retrieval in the VQ Compressed Domain using Image Annotation Statistical Models

A Pilot Study on the Semantic Classification of Two German Prepositions: Combining Monolingual and Multilingual Evidence

This paper reports on the annotation and maximum-entropy modeling of the semantics of two German prepositions, mit (‘with’) and auf (‘on’). 500 occurrences of each preposition were sampled from a treebank and annotated with syntactosemantic classes by two annotators. The classification is guided by a perspective of information extraction, relies on linguistic tests and aims at the separation of...

Some Remarks on Automatic Semantic Annotation of a Medical Corpus

In this paper we present arguments that elaborating a rule-based information extraction system is a good starting point for obtaining a semantically annotated corpus of medical data. Our claim is supported by evaluation results of the automatic annotation of a corpus containing hospital discharge reports of diabetic patients.

Journal title:

IJMSO

Volume 7, Issue

Pages -

Publication date: 2012
